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This paper describes numerical experiments with P-multigrid to corroborate analysis, 
validate the present implementation, and to examine issues that arise in the implemen- 
tations of the various combinations of relaxation schemes, discretizations and P-multigrid 
methods. The two approaches to implement P-multigrid presented here are equivalent 
for most high-order discretization methods such as spectral element, SUPG, and discon- 
tinuous Galerkin applied to advection; however it is discovered that the approach that 
mimics the common geometric multigrid implementation is less robust, and frequently un- 
stable when applied to discontinuous Galerkin discretizations of diffusion. Gauss-Seidel 
relaxation converges ~ 40% faster than block Jacobi, as predicted by analysis; however, 
the implementation of Gauss-Seidel is considerably more expensive that one would ex- 
pect because gradients in most neighboring elements must be updated. A compromise 
quasi Gauss-Seidel relaxation method that evaluates the gradient in each element twice per 
iteration converges at rates similar to those predicted for true Gauss-Seidel. 

I. Introduction 

The discontinuous Galerkin (DG) method has recently been applied to a range of fluid models from 
simple linear acoustics 1-4 to complete Navier-Stokes , 5-8 as well as related models in electromagnetics . 9, 10 
Much of the interest in DG stems from two inherent properties of the method. First, DG’s finite-element 
underpinnings provides a rigorous and robust means of discretizing partial differential equations on un- 
structured grids; and the use of unstructured grids is widely thought to be the most effective approach to 
modeling complex geometries. Second, and in contrast to traditional finite-element methods, the use of local 
discontinuous basis functions results in a local mass matrix and leads to a compact discretization. These in- 
herent properties of DG lead to a highly compact discretization that is well suited to propagation dominated 
problems that are efficiently resolved by explicit time marching methods. 

Most of DG’s advantageous features (accurate, robust, compact) persist when applied to partial differ- 
ential equations having a diffusive or elliptic component, such as the Navier-Stokes equations. However, in 
such cases, the DG discretization becomes very stiff , 11 and the use of an explicit time marching method 
is expensive. As a consequence, the DG method with explicit time marching is non-competitive for most 
problems involving diffusion, and is intractable in cases having a broad range of spatial scales such as high 
Reynolds number flows. While DG has been successfully applied to the Navier-Stokes using explicit time 
marching, these instances have been limited to cases with low Reynolds numbers . 5-7, 12 

Efforts to develop efficient implicit solvers for DG have included optimized relaxation schemes , 11 analysis 
of traditional geometric multigrid , 13 GMRES , 14-16 and P-multigrid . 17-20 Analysis of geometric multigrid 
indicates mesh that independent results are possible; however, the required restriction operators are complex 
and would be difficult to implement for general unstructured grids. A large effort in GMRES can be found in 
open literature; however the method incurs a large storage penalty, and none have achieved mesh independent 
convergence for high Reynolds number flows (or other highly stiff cases). P-multigrid is the most promising 
of these approaches, as well as the simplest to apply. The method provides mesh independent convergence 
and can be implemented with little storage or work overhead. 
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This paper describes numerical experiments with P-multigrid that corroborate the analysis presented in 
the companion paper, 20 and validate the present implementation. This paper will also discuss attributes 
of the various combinations of relaxation schemes, discretization methods, and P-multigrid methods that 
become apparent when considering the implementation of these methods. The first section of this paper 
describes the model problem and several variations of the DG discretization that will be considered in this 
study. The second section describes block Jacobi and Gauss-Seidel relaxation schemes, and discusses issues 
that arise in the implementation of the Gauss-Seidel method. The third section describes P-multigrid and 
two approaches for its implementation. The last section presents numerical experiments with the various 
discretizations and relaxation schemes. 

II. Model Problem, DG Discretizations 


A. Model Problem 

The Poisson equation is written as a set of first order equations, 


V 2 m = f(x) 


<7 — Vm = 0 
— V • 8 + f(x) = 0, 


(1) 


defined on the domain Cl, where u denotes the solution to the Poisson equation, a: is a point in Cl, a is the 
gradient vector, and f(x) is a given forcing function. 

The domain is partitioned into a collection of non-overlapping elements Cli , where the subscript l uniquely 
identifies an arbitrary element. Let Cl n denote an arbitrary element neighboring to Cli, let dCl^ n denote the 
edge segment between elements Cli and Cl n . For elements that lie on domain boundaries, let the boundary 
edge segments be denoted as dCli t b when they need to be distinguished. Finally, let {e;} denote the set of 
all edge segments of Cli, including any that may lie on the domain boundary. 


B. DG Discretization 


The following describes the discretizations in terms of a single arbitrary element. The solution in each element 
is approximated in terms of a set of local discontinuous basis functions </> = {&;,?:} as u « ui = Xq*} ^l,i u l,i 
and <7 ~ S( e Xqi} i R ft- (Note: 4> is used to denote either a set of basis functions, or a column vector 

of basis functions depending on the context.) Typically, the local basis set for a given element is taken to be 
the complete set of polynomials of degree less that or equal to some value p for points inside of the element, 
and zero for points outside the element. This choice of basis functions allows the discrete equations to be 
developed for a single element, instead of as a global summation of integrals as is commonly encountered 
in finite element methods. The governing equations are projected onto each member of the basis set and 
integrated by parts to obtain a set of equations in an arbitrary element that govern the set of unknown 
variables {iq^} in that element. 


j [bi,j(Ti + uiX7bi ti \ dCl - j bi ti hi(ui,Un)nds 

J [C/b L idi + f(x)} dCl - ^2 J di,u n , a n ) • nds 


{ei } dfli,- 


V» 

(2) 

v». 

(3) 


n is the outward unit normal. The integrands of the integrals over element boundaries are replaced by 
numerical fluxes that account for the discontinuous nature of the solution there. There are several variations 
of DG discretizations; however, it is shown in reference 21 that most can be written in a unified form, and 
that all differ only by the way in which the numerical fluxes are defined. 

The present work will focus on two forms: the local discontinuous Galerkin method 22 (LDG), and the 
interior penalty method 23,24 (IP). They are defined by: 


LDG: hi(ui,u n ) = {|w|} - $,„[«], 

h 2 (u^a l ,u n ,a n ) = {|cf|} + (di, n {d\ ~ aijn\u\, 


(4) 
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hi(ui,u n ) = {|u|} , 
h 2 {ui,ai,u n ,a n ) = {|Vw|} - oyn[w 


( 5 ) 


IP : 


where {|tc|} = (wi + w n )/2 denotes the average of a quantity, and [re] = w n — u>i denotes the jump of a 
quantity. f3^ n is a parameter in the range — ± < (3i^ n <a that effectively allows the operator to switch from 
a symmetric centered operator (/3; >n = 0), to a one-sided operator (/3; >n = ±i with /3; >n = The term 

ay is a numerical penalty term that is required for stability when /3; >n = 0, and it is defined as ay = i) ft -1 , 
where p > 0 and h is a mesh scale. For the IP method, must be increased when the order p is increased in 
order to maintain stability. Discretizations similar to LDG(/3;, n = 0) and IP, introduced by Brezzi et al. 25 
and Bassi et al. 6 respectively, are obtained by replacing aj fit] with an integral function of the solution jump 
a r (H). This form automatically scales with order to give the required stabilization for any p, although a 
user specified coefficient is usually included in the formulation. LDG is usually applied in one of two distinct 
forms: either /3; >n = 0 and rj > 0, or and p = 0. The former will be denoted as LDG-central and 

the later as LDG-one-sided. 


III. Relaxation Schemes 

For the purpose of discussing relaxation schemes, equations 2 and 3 are rewritten in the primal form 
obtained by eliminating dy from equation 3. 

R(U) = AU l + ]T B n U n = 0. (6) 

n£{nj} 

A and B n are matrices, Ui is a vector containing the solution coefficients {it;}, and {n{\ is the set of all 
elements in the stencil except 12;. It is worth noting that, for the IP discretization, {n;} contains only 
the nearest neighboring elements, but for LDG-centered, {n{\ additionally contains all the neighbors of the 
nearest neighbors. For the LDG-one-sided discretization on Cartesian grids, {n/} also contains only the 
nearest neighboring elements, but for non-Cartesian grids there is a weak coupling to some, but not all, of 
the neighbors of the nearest neighbors. 

A. Block Jacobi 

The block Jacobi (BJ) relaxation is defined as 

jjn+1 = un _ u) A~ 1 R(U n ). (7) 

An earlier analysis 11 of BJ applied to DG showed that BJ groups all of the eigenvalues to be less than two 
for all p , thus u) = 1 is always stable; however, a slight under relaxation of 0.95 can improve the robustness 
in some cases. This will be discussed more in the next section. 

B. Block Gauss-Seidel 

Gauss-Seidel (GS) is a sweeping algorithm which can be written in a form similar to BJ. The only difference 
is that the R(U) is evaluated just prior to the update, using most recent data available in all neighboring 
elements. 

ru+ 1 = jjn _ u A - 1 R(U n , U n+1 ) (8) 

Although it would appear that GS could be implemented for the same cost as BJ, in fact, this is not the 
case. It is most efficient to implement DG in the two distinct steps defined by equations 2 and 3, instead 
of directly implementing the primal form. However, the gradient flux ai becomes out of date each time a 
neighbor is updated. As a result, a n must be reevaluated in almost all neighboring elements, as well as the 
local element, before R(U n , U n+1 ) can be properly evaluated and the solution updated. The implementation 
tested below strikes a compromise in that the gradient flux is evaluated only twice in each element: once 
just before the update, and again immediately after the update. For this reason, the scheme will be labeled 
quasi-GS. It should be noted that for LDG-one-sided on a Cartesian grid, one particular sweep direction 
will result in true GS. Alternate sweeping patterns, such as a checkerboard pattern, can also eliminate this 
difficulty. 
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It is interesting to note that GS can be applied to the IP discretization without an additional cost 
penalty. Because the numerical flux depends only on gradients obtained by differentiating the solution, the 
contribution to the numerical flux from the solution in f 2/ is not affected by an update occurring in any 
neighboring element. However, it should also be noted that the IP scheme always computes a gradient twice 
in each element, although two different methods are used. 

One last point concerning the comparisons between analysis and the numerical experiments described in 
a later section is that the analysis presented in reference 20 is a Fourier analysis, and thus, is purely periodic. 
However, strictly speaking, GS, is not a periodic operator. In the numerical test, the explicit component of 
the spatial operator is treated periodically on the computational domain. However, the sweeping pattern of 
GS corresponds to a lower triangular matrix. Thus, some differences might be expected between the analysis 
and numerical test for GS. 


IV. P-multigrid 


P-multigrid is an iterative algorithm in which the convergence of a relaxation scheme applied to a set of 
high-order equations is improved by solving a low-order subset of the equations. The algorithm is applied 
recursively to the low-order set until the lowest order of p c = 0 or 1 is reached. At this point, P-multigrid 
can be combined with geometric multi-grid (grid coarsening) to solve the lowest order set of equations. 

The approach was first proposed by Rcmquist and Patera 26 for spectral element discretizations of elliptic 
equations, and further analyzed by Maday and Munoz. 27 Helenbrook 17 applied P-multigrid to the Navier- 
Stokes equations using a SUPG discretization, and coupled P-multigrid with geometric multigrid for the first 
time. 

Typically, the low-order subset is of degree p c = pi 2. For hierarchical bases, those for which the low- 
order basis set <f> c is a proper subset of the high-order basis set, the low-order set of correction equations is a 
subset of the high-order equations. The low order equations are solved for the subset of solution coefficients 
associated with low-order basis while holding the remaining components constant. For any </> c in the space 
spanned by (j), the prolongation operator is given formally by 


Ip, Pc 

where 




Lo, 


(9) 


The restriction operator is the transpose of the prolongation operator. If L p U p = F p denotes the set of 
high-order equations of degree p , then the low-order correction equation set is given by 


tT T TT tT T? 

1 V,PcV U V ~ 1 P,Pc r P' 


(10) 


In spite of its complex appearance, the restriction operator for an ordered hierarchical basis vector is simply 


= M] 


( 11 ) 


where I is an N Pc XN Pc identity matrix, 0 is an N Pc XN p - Pc null matrix, and N x denotes the number of 
members in a basis set of degree x. This approach will be referred to as algebraic P-multigrid, because the 
low-order equation set is constructed directly from the higlr-order set. 

An alternate formulation is obtained by employing the methodology commonly used in geometric multi- 
grid. That is, given a code that implements a relaxation scheme on L p U p = F p for arbitrary p , then that 
code can easily be used to solve a correction equation that is of the form 


L Pc U Pc = F(U P ,U P J, (12) 

where F(U P , U Pc ) is a forcing term given by 

F(U P , U p J = L p Jl p U p - I pPc (L p U p - F p ). (13) 

Because of its similarity to traditional geometric multigrid methods, this approach will be referred to as the 
rediscretization P-multigrid method. 
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Equations 12 and 13 are equivalent to 10 provided L Pc = I pPc L p I pPc . That is, the low-order operator 
is equal to the upper left block of the high-order operator. This property holds for methods such as SUPG, 
spectral element, and DG applied to strict first order equations (advection and propagation). However, it is 
not true for most DG discretizations of second derivative operators. Consider, for example, the A matrix of 
equation 6 . In the case of one-dimensional LDG-one-sided with a monomial basis, the matrices are 


A p =q — [— 2 ] , 


A p — i 


-8 -1 

-1 -2 

1! 

to 

II 

1 

L J 

L 


-18 

-1 

-7/2 

-1 

-9/2 

-3/4 

-7/2 

-3/4 

-23/4 


For LDG-central with monomial basis, the matrices are: 


A p — o — [ — 1/2 — 2rj \ , A p —i 


—29/8 — 2 ? 7 0 

0 -13/32-77/2 


(14) 


(15) 


Ap = 2 — 


-117/32 + 277 

0 

11/128 - 77/2 

0 

-213/128- 77/2 

0 

- 11/120 - 77/2 

0 

-95/1536 - 77/8 


(16) 


where 77 is the coefficient of the penalty term. Note that in each case, the lower order matrix does not equal 
the upper left-block of the high order matrix. Finally, for IP with monomial basis the problem is not so 
severe. The matrices are: 


A p =o — [ 2 77 ] , A p ~i 


-277 0 

0 1/2-77/2 


, A p —2 — 


277 

0 

-1 - 77/2 

0 

1/2 - 77/2 

0 

-77/2 

0 

1/12-77/8 


(17) 


For the principle part of the discretization, the low order matrix does equal the upper-left block of the high 
order matrix. The same is true of the contribution from the penalty terms; however recall that the coefficient 
of the penalty term should increase with p in order to maintain stability. However, if 77 is frozen at the high- 
order value while formulating the low order correction, then the low order equations with the appropriate 
forcing terms are equivalent to the low order component of the high order equations. For methods using the 
a r ([[/]) penalty term 6,25 the scaling is built in, and the matrices cannot be made equivalent by adjusting 
an explicit constant. 

Numerical experiments with the LDG-one-sided found the rediscretization P-multigrid method to be 
highly unstable. The analysis in reference 20 confirms this result; and further shows that most discretizations 
are less robust, and that many are unstable, when rediscretization P-multigrid is applied. 

The algebraic form of P-multigrid is not only more robust, but it can also be more efficient to implement; 
although it requires some special considerations in the code design. Recall that the formal restriction operator 
is essentially a trivial operator that simply selects a subset of the high-order equation set. This suggests it is 
possible to solve the low-order set “in-place” ; that is, to use the existing data arrays, and modify the code to 
operate only on selected subsets of the data. This eliminates the need to allocate any additional storage and 
to copy data to and from that storage. From this prespective, it is appropriate to view algebraic P-multigrid 
as simply a relaxation in p-space in which some equations are solved while holding some variables constant. 

The analysis 20 assumes that the lowest order set of equations are solved exactly. In the numerical test, 
the lowest order set is solved to some finite level of convergence. In most cases, the same relaxation scheme 
applied to the highest order equations is also applied to all lower order equations, but a sufficiently large 
number of iterations is applied to the lowest order set such that those equations are effectively solved. 

As mentioned earlier, analysis 11 has shown that BJ bounds the eigenvalues of LDG-one-sided to be less 
than two (and GS to less that one). However, the same is not true of the low-order subset generated by 
algebraic P-multigrid, and some under relaxation can improve robustness of block Jacobi. In the numerical 
test in the next section, block Jacobi uses a relaxation of 1.0 at the highest level but 0.95 on lower levels. 

For the remainder of this paper, the high-order set of equations will be referred to as the high-p equations 
and the low-order set of correction equations will be referred to as the low-p equations. 
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V. Numerical Test 


P-multigrid is evaluated for the Laplace equation defined on a 2D unit domain 0 < x, y < 1 with periodic 
boundaries. The domain is partitioned by an NXN Cartesian grid. The initial solution is broadband with 
significant energy in the highest portion of the spectrum visible on any grid. 

u(x,y) = F(2x)F(2y) + F(Nx)F(Ny) (18) 

where F(s) = exp(cos(7rs) — 1) 

Figure 1 illustrates a ID version of the above function for N=16. Such an initial solution was found to be 



Figure 1. Representative form (ID) of initial solution for N=16. 


necessary because of the two level testing approach used in most of the numerical experiments. Although the 
longest wavelength usually dominates the asymptotic convergence rate, its convergence behavior is governed 
primarily by the low-order equations which are solved exactly. Thus a two grid test initialized with only 
long wavelengths will converge artificially fast. 

Although accuracy is not a focus of this study, the solution error is examined in each case to verify 
that the method is converging with the expected order property. The equations are forced with f{x) = 
—2 * w 2 cos (lux) cos {toy) so that the exact solution is non-trivial. 

Unless otherwise noted, the first several results are for the LDG-one-sided discretization with p = 4. 
Figure 2(a) shows the convergence of BJ for a 2-level P-multigrid scheme in which the residual of the low-p 
equations are converged to 0.001 below the residual of the high-p equations. The effective spectral radius of 
convergence a is « 0.71, and agrees well with the analysis’ 20 prediction of 0.73. Furthermore, the asymptotic 
convergence rate is independent of mesh size. 

The analysis 20 assumes the low-p equations are solved exactly. However, in practice, the low-p equations 
will be only approximately solved. Figure 2(b) shows the convergence of the low-p equations when the 
convergence criterion is 0.001. To examine the effect of only approximately solving the low-p equations, as 
well as to establish that the low-p equations are adequately solved in the above result, test are performed 
in which the convergence criterion on the low-p equations is increased to 0.01, 0.1, and 0.5. For 0.01 and 
0.1 (not shown) the convergence rates of the high-p equations are nearly identical to those given above. In 
the case with 0.5, shown in figure 3(a) and (b), the asymptotic rate is still approximately 0.71, although the 
extremely rapid convergence that occurred over the first few iterations in the earlier case is not present, and 
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Figure 2. Convergence of 2-level P-multigrid using block Jacobi: in ( a ) IHh, line only; cr, line with symbol. 




Figure 3. Convergence of 2-level P-multigrid using block Jacobi with relaxed low-p convergence criterion: in 
(a) | |e| I 2 ? line only; cr, line with symbol. 
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the convergence rates on the coarsest meshes are slightly erratic. Comparing figures 2(b) with 3(b), we note 
that the total number of iterations on the low-p equations, and hence the cost, is not greatly reduced by 
increasing the convergence criterion from 0.001 to 0.5, as one might have expected. A convergence criterion 
of 0.01 is used in the remaining results. 

Figure 4(a) shows the results for GS applied to LDG-one-sided for the particular sweep direction that 
produces true GS. The numerical convergence rate approaches « 0.585 which is very close to the predicted 
value of 0.58. Figure 4(b) gives the convergence history of the finest grid for all possible sweep directions. The 
variation in convergence with sweep direction is minimal, and the quasi-GS scheme presented here performs 
similar to the true GS scheme. 



Sweep Direction 



Figure 4. Convergence of 2-level P-multigrid using Gauss-Seidel: (a) 1 1 e 1 1 2 ? line only; a, line with symbol, (b) 
finest grid for all sweep directions. 


Table 1 gives the convergence rates for BJ and quasi-GS using both LDG-central and LDG-one-sided. 
Also shown is the convergence rates predicted by the analysis given in 20. The agreement is good in every 
case but LDG-central with BJ. The reason for this discrepency is not known. Quasi-GS agrees well with the 
analysis for GS for both LDG-central and LDG-one-sided. 

The analysis 20 predicts that the convergence rate of the 2-level scheme is about the same as that of the 
V-cycle. Figures 5 and 6 are for a V-cycle in which the minimum order is set at p c = 1. For the case shown 
in figure 5, one iteration is performed at all levels but that of the lowest-order level, which was solved to 
a convergence criterion of 0.01. In the numerical test, the convergence rate increases to « 0.78, which is 
greater than that of the two-level cycle. However, increasing the number of iterations on the intermediate 
levels to 2, shown in figure 6(a), improves the convergence rate to slightly better than that of the 2-level 
cycle. Figure 6(b) shows similar results for GS applied to LDG-one-sided. 

Although the two- level scheme with p = 1, p c = 0 (not shown) converges well; V-cycles with p c = 0 and 
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scheme 


BJ, LDG-central 

GS, LDG-central 

BJ, LDG-one-sided 

GS, LDG-one-sided 

Numerical 

0.6 

0.47 

0.71 

0.58 

Analysis 

0.75 

0.51 

0.73 

0.58 


Table 1. Comparison of convergence rates with analysis of 20. 



Figure 5. Convergence of V-cycle P-multigrid using block Jacobi with one iteration at intermediate levels: 
1 1 e| 1 2 ? line only; cr, line with symbol. 




(a) (b) 

Figure 6. Convergence of V-cycle P-multigrid using with 2 relaxation steps on intermediate orders: 1 1 e 1 1 2 9 line 
only; cr, line with symbol; (a) block Jacobi, (b) Gauss-Seidel 
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p > 2 converge poorly (er = 0.94 on an N=32 grid and is mesh dependent.) This suggests an approach in 
which an embedded cycle is applied to converge the p = 1 level as illustrated in figure 7(a). Figure 7(b) shows 
that applying an embedded cycle with 40 iterations gives results identical to that obtained when p c = 1. 




(a) (b) 


Figure 7. (a) Embedded V-cycle to solve p = 1; (b) Convergence of V-cycle P-multigrid using Gauss-Seidel 

with 2 relaxation steps on intermediate orders and an embedded cycle with 40 iterations at p = 1, ||e|| 2 , line 
only; cr, line with symbol. 


VI. Conclusions 

Numerical tests with P-multigrid confirm that the scheme behaves essentially as the analysis of reference 20 
predicts. It is confirmed that GS does converge better than BJ, but the numerical test reveals that GS can be 
considerably more expensive, and more complicated to implement, that one might expect. Tests with various 
sweep directions suggest that the quasi-GS scheme presented here converges as well as the true GS method. 
It is found that the rediscretization P-multigrid is unstable for LDG-one-sided, as predicted in the analysis. 
However, the algebraic form of P-multigrid can be easily implemented by solving the low order equation 
subset “in place”, reusing much of the the existing storage and algorithms(code). Tests in which the low-p 
convergence criterion varied suggested that the low-p equations do not need to be exactly solved; however, 
increasing the criterion did not significantly reduce the overall work. A V-cycle with p c = 1 converges at 
a rate similar to the two-level cycle. Reducing p c to zero results in poor convergence. However solving the 
p = 1 level with an embedded cycle restores the overall convergence rate to that of a V-cycle with p c = 1. 
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